Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-19.2: kv/kvserver: use LAI from before split when setting initialMaxClosed #46154

Merged

Conversation

nvanbenschoten
Copy link
Member

Backport 1/2 commits from #46085.

/cc @cockroachdb/release


Addresses 1 of 2 bugs in #44878.
Skews with #46017, so I'll let that go in first.

Before this change, TestKVNemesisMultiNode with splits enabled and run with 3x increased frequency failed every ~2 minutes under roachprod-stress. After this change, I have yet to see it fail after 20 40 minutes.

The initialMaxClosed is assigned to the RHS replica to ensure that follower reads do not regress following the split. After the split occurs there will be no information in the closedts subsystem about the newly minted RHS range from its leaseholder's store. Furthermore, the RHS will have a lease start time equal to that of the LHS which might be quite old. This means that timestamps which follow the least StartTime for the LHS part are below the current closed timestamp for the LHS would no longer be readable on the RHS after the split.

It is necessary for correctness that the call to maxClosed used to determine the current closed timestamp happens during the splitPreApply so that it uses a LAI that is before the index at which this split is applied. If it were to refer to a LAI equal to or after the split then the value of initialMaxClosed might be unsafe.

Concretely, any closed timestamp based on an LAI that is equal to or above the split index might be larger than the initial closed timestamp assigned to the RHS range's initial leaseholder. This is because the LHS range's leaseholder could continue closing out timestamps at the split's LAI after applying the split. Slow followers in that range could hear about these closed timestamp notifications before applying the split themselves. If these slow followers were allowed to pass these closed timestamps created after the split to the RHS replicas they create during the application of the split then these RHS replicas might end up with initialMaxClosed values above their current range's official closed timestamp. The leaseholder of the RHS range could then propose a write at a timestamp below this initialMaxClosed, violating the closed timestamp system's most important property.

Using an LAI from before the index at which this split is applied avoids the hazard and ensures that no replica on the RHS is created with an initialMaxClosed that could be violated by a proposal on the RHS's initial leaseholder. See #44878.

Release node (bug fix): fixed a bug which could cause CDC to violate its resolved timestamp guarantee. To some observers, this could look like data loss in CDC.

Addresses 1 of 2 bugs in cockroachdb#44878.

Before this change, `TestKVNemesisMultiNode` with splits enabled and run
with 3x increased frequency failed every ~2 minutes under stress. After
this change, I have yet to see it fail after 20 minutes.

The initialMaxClosed is assigned to the RHS replica to ensure that
follower reads do not regress following the split. After the split occurs
there will be no information in the closedts subsystem about the newly
minted RHS range from its leaseholder's store. Furthermore, the RHS will
have a lease start time equal to that of the LHS which might be quite
old. This means that timestamps which follow the least StartTime for the
LHS part are below the current closed timestamp for the LHS would no
longer be readable on the RHS after the split.

It is necessary for correctness that the call to maxClosed used to
determine the current closed timestamp happens during the splitPreApply
so that it uses a LAI that is _before_ the index at which this split is
applied. If it were to refer to a LAI equal to or after the split then
the value of initialMaxClosed might be unsafe.

Concretely, any closed timestamp based on an LAI that is equal to or
above the split index might be larger than the initial closed timestamp
assigned to the RHS range's initial leaseholder. This is because the LHS
range's leaseholder could continue closing out timestamps at the split's
LAI after applying the split. Slow followers in that range could hear
about these closed timestamp notifications before applying the split
themselves. If these slow followers were allowed to pass these closed
timestamps created after the split to the RHS replicas they create during
the application of the split then these RHS replicas might end up with
initialMaxClosed values above their current range's official closed
timestamp. The leaseholder of the RHS range could then propose a write at
a timestamp below this initialMaxClosed, violating the closed timestamp
system's most important property.

Using an LAI from before the index at which this split is applied avoids
the hazard and ensures that no replica on the RHS is created with an
initialMaxClosed that could be violated by a proposal on the RHS's
initial leaseholder. See cockroachdb#44878.

Release node (bug fix): fixed a bug which could cause CDC to violate
its resolved timestamp guarantee. To some observers, this could look
like data loss in CDC.
@nvanbenschoten nvanbenschoten requested a review from ajwerner March 16, 2020 16:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@nvanbenschoten nvanbenschoten merged commit 4f36d0c into cockroachdb:release-19.2 Mar 16, 2020
@nvanbenschoten nvanbenschoten deleted the backport19.2-46085 branch March 16, 2020 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants